Dynamic Shannon Coding

Travis Gagie, Student Member, IEEE 


Abstract —We present a new algorithm for dynamic prefix- 
free coding, based on Shannon coding. We give a simple analysis 
and prove a better upper bound on the length of the encoding 
produced than the corresponding bound for dynamic Huffman 
coding. We show how our algorithm can be modified for efficient 
length-restricted coding, alphabetic coding and coding with 
unequal letter costs. 

Index Terms —Data compression, length-restricted codes, alphabetic
codes, codes with unequal letter costs.


I. Introduction 


Prefix-free coding is a well-studied problem in data compression
and combinatorial optimization. For this problem, we
are given a string $S = s_1 \cdots s_m$ drawn from an alphabet of
size n and must encode each character by a self-delimiting 
binary codeword. Our goal is to minimize the length of the 
entire encoding of S. For static prefix-free coding, we are 
given all of S before we start encoding and must encode 
every occurrence of the same character by the same codeword. 
The assignment of codewords to characters is recorded as 
a preface to the encoding. For dynamic prefix-free coding, 
we are given S character by character and must encode each 
character before receiving the next one. We can use a different 
codeword for different occurrences of the same character, we 
do not need a preface to the encoding and the assignment of 
codewords to characters cannot depend on the suffix of S not 
yet encoded. 

The best-known algorithms for static coding are by 
Shannon [1] and Huffman [2]. Shannon’s algorithm uses at
most $(H + 1)m + O(n \log n)$ bits to encode $S$, where

$$H = \sum_{a \in S} \frac{\#_a(S)}{m} \log \frac{m}{\#_a(S)}$$

is the empirical entropy of $S$ and $\#_a(S)$ is the number of
occurrences of the character $a$ in $S$. By $\log$ we mean $\log_2$.
Shannon proved a lower bound of $Hm$ bits for all coding
algorithms, whether or not they are prefix-free. Huffman’s
algorithm produces an encoding that, excluding the preface,
has minimum length. The total length is $(H + r)m + O(n \log n)$
bits, where $0 \le r < 1$ is a function of the character frequencies
in $S$ [3].
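As a small illustration of the definition above, the following sketch computes the empirical entropy $H$ of a string and the lower bound $Hm$. The function name `empirical_entropy` is ours and is not part of the paper.

```python
from collections import Counter
from math import log2

def empirical_entropy(s):
    """Return H = sum over distinct a of (#_a(S)/m) * log2(m/#_a(S))."""
    m = len(s)
    counts = Counter(s)
    return sum((c / m) * log2(m / c) for c in counts.values())

if __name__ == "__main__":
    s = "abracadabra"
    h = empirical_entropy(s)
    # H*m is Shannon's lower bound on the number of bits for s
    print(f"H = {h:.4f} bits/char, Hm = {h * len(s):.2f} bits")
```

A string of one repeated character has $H = 0$, and a string in which all $n$ characters are equally frequent has $H = \log n$.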

Both algorithms assign codewords to characters by constructing
a code-tree, that is, a binary tree whose left and
right edges are labelled by 0’s and 1’s, respectively, and
whose leaves are labelled by the distinct characters in $S$. The
codeword assigned to a character $a$ in $S$ is the sequence of
edge labels on the path from the root to the leaf labelled $a$.
Shannon’s algorithm builds a code-tree in which, for $a \in S$,
the leaf labelled $a$ is of depth at most $\lceil \log(m/\#_a(S)) \rceil$.
Huffman’s algorithm builds a Huffman tree for the frequencies
of the characters in $S$. A Huffman tree for a sequence of
weights $w_1, \ldots, w_n$ is a binary tree whose leaves, in some
order, have weights $w_1, \ldots, w_n$ and that, among all such
trees, minimizes the weighted external path length. To build
a Huffman tree for $w_1, \ldots, w_n$, we start with $n$ trees, each
consisting of just a root. At each step, we make the two roots
with smallest weights, $w_i$ and $w_j$, into the children of a new
root with weight $w_i + w_j$.

A minimax tree for a sequence of weights $w_1, \ldots, w_n$
is a binary tree whose leaves, in some order, have weights
$w_1, \ldots, w_n$ and that, among all such trees, minimizes the
maximum sum of any leaf’s weight and depth. Golumbic [4]
gave an algorithm, similar to Huffman’s, for constructing a
minimax tree. The difference is that, when we make the two
roots with smallest weights, $w_i$ and $w_j$, into the children of a
new root, that new root has weight $\max(w_i, w_j) + 1$ instead of
$w_i + w_j$. Notice that, if there exists a binary tree whose leaves,
in some order, have depths $d_1, \ldots, d_n$, then a minimax tree
$T$ for $-d_1, \ldots, -d_n$ is such a tree and, more generally, the
depth of each node in $T$ is bounded above by the negative
of its weight. So we can construct a code-tree for Shannon’s
algorithm by running Golumbic’s algorithm, starting with roots
labelled by the distinct characters in $S$, with the root labelled
$a$ having weight $-\lceil \log(m/\#_a(S)) \rceil$.
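The construction just described can be sketched as follows. This is a minimal illustration, not an optimized implementation: it uses a binary heap from the standard library, breaks ties arbitrarily, and the names `golumbic_depths` and `ceil_log2_ratio` are our own.

```python
import heapq
from collections import Counter
from itertools import count

def ceil_log2_ratio(p, q):
    """Exact ceil(log2(p/q)) for integers p >= q >= 1."""
    k = 0
    while (q << k) < p:
        k += 1
    return k

def golumbic_depths(weights):
    """Merge the two smallest roots into a root of weight max + 1.

    weights maps each leaf label (a string) to an integer weight;
    the result maps each label to its depth in the tree built.
    """
    tick = count()  # tie-breaker so the heap never compares subtrees
    heap = [(w, next(tick), label) for label, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (max(w1, w2) + 1, next(tick), (t1, t2)))
    depths, stack = {}, [(heap[0][2], 0)]
    while stack:
        node, d = stack.pop()
        if isinstance(node, tuple):      # internal node: (left, right)
            stack.append((node[0], d + 1))
            stack.append((node[1], d + 1))
        else:                            # leaf: a character label
            depths[node] = d
    return depths

if __name__ == "__main__":
    s = "abracadabra"
    counts, m = Counter(s), len(s)
    weights = {a: -ceil_log2_ratio(m, c) for a, c in counts.items()}
    print(golumbic_depths(weights))
```

With the weights $-\lceil \log(m/\#_a(S)) \rceil$, each leaf ends up at depth at most $\lceil \log(m/\#_a(S)) \rceil$, as the paragraph above argues.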

Both Shannon’s algorithm and Huffman’s algorithm have 
three phases: a first pass over S to count the occurrences of 
each distinct character, an assignment of codewords to the 
distinct characters in S (recorded as a preface to the encoding) 
and a second pass over S to encode each character in S using 
the assigned codeword. The first phase takes $O(m)$ time, the
second $O(n \log n)$ time and the third $O((H + 1)m)$ time.

For any static algorithm A, there is a simple dynamic 
algorithm that recomputes the code-tree from scratch after 
reading each character. Specifically, for $i = 1, \ldots, m$:

1) We keep a running count of the number of occurrences
of each distinct character in the current prefix $s_1 \cdots s_{i-1}$
of $S$.

2) We compute the assignment of codewords to characters
that would result from applying $A$ to $\perp s_1 \cdots s_{i-1}$,
where $\perp$ is a special character not in the alphabet.

3) If $s_i$ occurs in $s_1 \cdots s_{i-1}$, then we encode $s_i$ as the
codeword $c_i$ assigned to that character.

4) If $s_i$ does not occur in $s_1 \cdots s_{i-1}$, then we encode $s_i$ as
the concatenation $c_i$ of the codeword assigned to $\perp$ and
the binary representation of $s_i$’s index in the alphabet.

We can later decode character by character. That is, we can
recover $s_1 \cdots s_i$ as soon as we have received $c_1 \cdots c_i$. To
see why, assume that we have recovered $s_1 \cdots s_{i-1}$. Then
we can compute the assignment of codewords to characters
that $A$ used to encode $s_i$. Since $A$ is prefix-free, $c_i$ is the only
codeword in this assignment that is a prefix of $c_i \cdots c_m$. Thus,
we can recover $s_i$ as soon as $c_i$ has been received. This takes
the same amount of time as encoding $s_i$.
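The simple dynamic algorithm and its decoder can be sketched as below, taking $A$ to be Shannon’s algorithm. Several details are our assumptions rather than the paper’s: codewords are assigned canonically from the Shannon lengths (any prefix-free assignment with those lengths would do), the alphabet is passed as a string known to both sides, and the decoder is told the number of characters $m$.

```python
from collections import Counter

BOTTOM = "\u22a5"  # stand-in for the special character not in the alphabet

def ceil_log2_ratio(p, q):
    """Exact ceil(log2(p/q)) for integers p >= q >= 1."""
    k = 0
    while (q << k) < p:
        k += 1
    return k

def shannon_code(counts):
    """Canonical prefix-free code with lengths ceil(log2(total/count))."""
    total = sum(counts.values())
    lengths = sorted((ceil_log2_ratio(total, c), a) for a, c in counts.items())
    code, value, prev = {}, 0, 0
    for length, a in lengths:
        value <<= length - prev
        code[a] = format(value, "b").zfill(length) if length else ""
        value += 1
        prev = length
    return code

def encode(s, alphabet):
    width = ceil_log2_ratio(len(alphabet), 1)  # bits for an alphabet index
    counts, out = Counter({BOTTOM: 1}), []
    for ch in s:
        code = shannon_code(counts)            # step 2: rebuild from scratch
        if ch in counts:                       # step 3: repeated character
            out.append(code[ch])
        else:                                  # step 4: first occurrence
            index = format(alphabet.index(ch), "b").zfill(width) if width else ""
            out.append(code[BOTTOM] + index)
        counts[ch] += 1                        # step 1: running counts
    return "".join(out)

def decode(bits, alphabet, m):
    width = ceil_log2_ratio(len(alphabet), 1)
    counts, out, pos = Counter({BOTTOM: 1}), [], 0
    for _ in range(m):
        inverse = {w: a for a, w in shannon_code(counts).items()}
        end = pos
        while bits[pos:end] not in inverse:    # grow until a codeword matches
            end += 1
            if end > len(bits):
                raise ValueError("truncated input")
        a, pos = inverse[bits[pos:end]], end
        if a == BOTTOM:
            a = alphabet[int(bits[pos:pos + width], 2)] if width else alphabet[0]
            pos += width
        out.append(a)
        counts[a] += 1
    return "".join(out)

if __name__ == "__main__":
    bits = encode("abracadabra", "abcdr")
    print(bits, decode(bits, "abcdr", 11))
```

The decoder mirrors the encoder exactly: before reading each codeword it rebuilds the same code from the characters recovered so far, which is the argument given above.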

Faller [5] and Gallager [6] independently gave a dynamic
coding algorithm based on Huffman’s algorithm. Their algorithm
is similar to, but much faster than, the simple dynamic
algorithm obtained by adapting Huffman’s algorithm as described
above. After encoding each character of $S$, their algorithm
merely updates the Huffman tree rather than rebuilding it
from scratch. Knuth [7] implemented their algorithm so that it
uses time proportional to the length of the encoding produced.
For this reason, it is sometimes known as Faller-Gallager-Knuth
coding; however, it is most often called dynamic
Huffman coding. Milidiu, Laber, and Pessoa [8] showed that
this version of dynamic Huffman coding uses fewer than $2m$
more bits to encode $S$ than Huffman’s algorithm. Vitter [9]
gave an improved version that he showed uses fewer than
$m$ more bits than Huffman’s algorithm. These results imply
Knuth’s and Vitter’s versions use at most $(H + 2 + r)m +
O(n \log n)$ and $(H + 1 + r)m + O(n \log n)$ bits, respectively, to encode
$S$, but it is not clear whether these bounds are tight. Both
algorithms use $O((H + 1)m)$ time.

In this paper, we present a new dynamic algorithm, dynamic 
Shannon coding. In Section II, we show that the simple
dynamic algorithm obtained by adapting Shannon’s algorithm
as described above uses at most $(H + 1)m + O(n \log m)$ bits
and $O(mn \log n)$ time to encode $S$. Section III contains our
main result, an improved version of dynamic Shannon coding
that uses at most $(H + 1)m + O(n \log m)$ bits to encode $S$
and only $O((H + 1)m + n \log^2 m)$ time. The relationship
between Shannon’s algorithm and this algorithm is similar 
to that between Huffman’s algorithm and dynamic Huffman 
coding, but our algorithm is much simpler to analyze than 
dynamic Huffman coding. 

In Section IV, we show that dynamic Shannon coding can
be applied to three related problems. We give algorithms for 
dynamic length-restricted coding, dynamic alphabetic coding 
and dynamic coding with unequal letter costs. Our algorithms 
have better bounds on the length of the encoding produced 
than were previously known. For length-restricted coding, no 
codeword can exceed a given length. For alphabetic coding, 
the lexicographic order of the codewords must be the same as 
that of the characters. 

Throughout, we make the common simplifying assumption
that $m \ge n$. Our model of computation is the unit-cost word
RAM with $O(\log m)$-bit words. In this model, ignoring space
required for the input and output, all the algorithms mentioned
in this paper use $O(|\{a : a \in S\}|)$ words, that is, space
proportional to the number of distinct characters in $S$.

II. Analysis of Simple Dynamic Shannon Coding 

In this section, we analyze the simple dynamic algorithm 
obtained by repeating Shannon’s algorithm after each character 
of the string $\perp s_1 \cdots s_m$, as described in the introduction. Since
the second phase of Shannon’s algorithm, assigning codewords
to characters, takes $O(n \log n)$ time, this simple algorithm uses
$O(mn \log n)$ time to encode $S$. The rest of this section shows
this algorithm uses at most $(H + 1)m + O(n \log m)$ bits to
encode $S$.

For $1 \le i \le m$ and each distinct character $a$ that occurs
in $\perp s_1 \cdots s_{i-1}$, Shannon’s algorithm on $\perp s_1 \cdots s_{i-1}$ assigns
to $a$ a codeword of length at most $\lceil \log(i/\#_a(\perp s_1 \cdots s_{i-1})) \rceil$.
This fact is key to our analysis.

Let $R$ be the set of indices $i$ such that $s_i$ is a repetition
of a character in $s_1 \cdots s_{i-1}$. That is, $R = \{i : 1 <
i \le m,\ s_i \in \{s_1, \ldots, s_{i-1}\}\}$. Our analysis depends on the
following technical lemma.

Lemma 1:

$$\sum_{i \in R} \log \frac{i}{\#_{s_i}(s_1 \cdots s_{i-1})} \le Hm + O(n \log m) .$$

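Proof: Let

$$L = \sum_{i \in R} \log \frac{i}{\#_{s_i}(s_1 \cdots s_{i-1})} .$$

Notice that $\sum_{i \in R} \log i \le \sum_{i=1}^{m} \log i = \log(m!)$. Also, for
$i \in R$, if $s_i$ is the $j$th occurrence of $a$ in $S$, for some $j \ge 2$,
then $\log \#_{s_i}(s_1 \cdots s_{i-1}) = \log(j - 1)$. Thus,

$$\begin{aligned}
L &= \sum_{i \in R} \log i - \sum_{i \in R} \log \#_{s_i}(s_1 \cdots s_{i-1}) \\
  &\le \log(m!) - \sum_{a \in S} \sum_{j=2}^{\#_a(S)} \log(j - 1) \\
  &= \log(m!) - \sum_{a \in S} \log(\#_a(S)!) + \sum_{a \in S} \log \#_a(S) .
\end{aligned}$$

There are at most $n$ distinct characters in $S$ and each occurs
at most $m$ times, so $\sum_{a \in S} \log \#_a(S) \in O(n \log m)$. By
Stirling’s Formula,

$$x \log x - \frac{x}{\ln 2} < \log(x!) \le x \log x - \frac{x}{\ln 2} + O(\log x) .$$

Thus,

$$L \le m \log m - \frac{m}{\ln 2} - \sum_{a \in S} \left( \#_a(S) \log \#_a(S) - \frac{\#_a(S)}{\ln 2} \right) + O(n \log m) .$$

Since $\sum_{a \in S} \#_a(S) = m$,

$$L \le \sum_{a \in S} \#_a(S) \log \frac{m}{\#_a(S)} + O(n \log m) .$$

By definition, this is $Hm + O(n \log m)$. ■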
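The steps of this proof can be checked numerically. The sketch below (our own illustration; `lemma1_sides` is a hypothetical name) computes the sum $L$, the intermediate bound $\log(m!) - \sum_a \log(\#_a(S)!) + \sum_a \log \#_a(S)$, and $Hm$. For the final comparison it uses the explicit slack $n \log m$, which follows from the standard fact that a multinomial coefficient is at most $2^{Hm}$; the lemma itself only states $O(n \log m)$.

```python
from collections import Counter
from math import factorial, log2

def lemma1_sides(s):
    """Return (L, intermediate bound, Hm) for the string s."""
    seen = Counter()
    lhs = 0.0
    for i, ch in enumerate(s, start=1):
        if seen[ch] > 0:                      # i is in R: s_i is a repetition
            lhs += log2(i / seen[ch])
        seen[ch] += 1
    m = len(s)
    mid = log2(factorial(m)) - sum(
        log2(factorial(c)) - log2(c) for c in seen.values()
    )
    hm = sum(c * log2(m / c) for c in seen.values())
    return lhs, mid, hm

if __name__ == "__main__":
    for s in ["abracadabra", "mississippi" * 5]:
        lhs, mid, hm = lemma1_sides(s)
        print(f"{s[:11]!r}: L = {lhs:.2f}, bound = {mid:.2f}, Hm = {hm:.2f}")
```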

As an aside, we note $\sum_{a \in S} \log \#_a(S) \in o((H + 1)m)$;
to see why, compare corresponding terms in $\sum_{a \in S} \log \#_a(S)$
and the expansion

$$(H + 1)m = \sum_{a \in S} \#_a(S) \left( \log \frac{m}{\#_a(S)} + 1 \right) .$$

Using Lemma 1, it is easy to bound the number of bits that
simple dynamic Shannon coding uses to encode $S$.

Theorem 2: Simple dynamic Shannon coding uses at most
$(H + 1)m + O(n \log m)$ bits to encode $S$.

Proof: If $s_i$ is the first occurrence of that character in $S$
(i.e., $i \in \{1, \ldots, m\} - R$), then the algorithm encodes $s_i$ as the
codeword for $\perp$, which is at most $\lceil \log m \rceil$ bits, followed by
the binary representation of $s_i$’s index in the alphabet, which
is $\lceil \log n \rceil$ bits. Since there are at most $n$ such characters, the
algorithm encodes them all using $O(n \log m)$ bits.

Now, consider the remaining characters in $S$, that is, those
characters whose indices are in $R$. In total, the algorithm
encodes these using at most

$$\sum_{i \in R} \left\lceil \log \frac{i}{\#_{s_i}(\perp s_1 \cdots s_{i-1})} \right\rceil < m + \sum_{i \in R} \log \frac{i}{\#_{s_i}(s_1 \cdots s_{i-1})}$$

bits. By Lemma 1, this is at most $(H + 1)m + O(n \log m)$.

Therefore, in total, this algorithm uses at most $(H + 1)m +
O(n \log m)$ bits to encode $S$. ■


III. Dynamic Shannon Coding 

This section explains how to improve simple dynamic 
Shannon coding so that it uses at most $(H + 1)m + O(n \log m)$
bits and $O((H + 1)m + n \log^2 m)$ time to encode the string
$S = s_1 \cdots s_m$. The main ideas for this algorithm are using
a dynamic minimax tree to store the code-tree, introducing 
“slack” in the weights and using background processing to 
keep the weights updated. 

Gagie [10] showed that Faller’s, Gallager’s and Knuth’s 
techniques for making Huffman trees dynamic can be used 
to make minimax trees dynamic. A dynamic minimax tree T 
supports the following operations: 

• given a pointer to a node v, return v’s parent, left child, 

and right child (if they exist); 

• given a pointer to a leaf v, return v’s weight; 

• given a pointer to a leaf v, increment v’s weight; 

• given a pointer to a leaf v, decrement v’s weight; 

• and, given a pointer to a leaf v , insert a new leaf with 
the same weight as v. 

In Gagie’s implementation, if the depth of each node is 
bounded above by the negative of its weight, then each 
operation on a leaf with weight $-d_i$ takes $O(d_i)$ time. Next,
we will show how to use this data structure for fast dynamic 
Shannon coding. 

We maintain the invariant that, after we encode
$s_1 \cdots s_{i-1}$, $T$ has one leaf labelled $a$ for each
distinct character $a$ in $\perp s_1 \cdots s_{i-1}$ and this leaf has
weight between $-\lceil \log((i + n)/\#_a(\perp s_1 \cdots s_{i-1})) \rceil$ and
$-\lceil \log(\max(i, n)/\#_a(\perp s_1 \cdots s_{i-1})) \rceil$. Notice that applying
Shannon’s algorithm to $\perp s_1 \cdots s_{i-1}$ results in a code-tree in
which, for $a \in \perp s_1 \cdots s_{i-1}$, the leaf labelled $a$ is of depth
at most $\lceil \log(i/\#_a(\perp s_1 \cdots s_{i-1})) \rceil$. It follows that the depth
of each node in $T$ is bounded above by the negative of its
weight.

Notice that, instead of having just $i$ in the numerator, as
we would for simple dynamic Shannon coding, we have at
most $i + n$. Thus, this algorithm may assign slightly longer
codewords to some characters. We allow this “slack” so that,
after we encode each character, we only need to update the
weights of at most two leaves. In the analysis, we will show
that the extra $n$ only affects low-order terms in the bound on
the length of the encoding.

After we encode $s_i$, we ensure that $T$ contains one
leaf labelled $s_i$ and this leaf has weight $-\lceil \log((i + 1 +
n)/\#_{s_i}(\perp s_1 \cdots s_i)) \rceil$. First, if $s_i$ is the first occurrence of that
distinct character in $S$ (i.e., $i \in \{1, \ldots, m\} - R$), then we insert
a new leaf labelled $s_i$ into $T$ with the same weight as the leaf
labelled $\perp$. Next, we update the weight of the leaf labelled $s_i$.
We consider this processing to be in the foreground.

In the background, we use a queue to cycle through
the distinct characters that have occurred in the current
prefix. For each character that we encode in the foreground,
we process one character in the background. When we
dequeue a character $a$, if we have encoded precisely
$s_1 \cdots s_i$, then we update the weight of the leaf labelled
$a$ to be $-\lceil \log((i + 1 + n)/\#_a(\perp s_1 \cdots s_i)) \rceil$, unless it
has this weight already. Since there are always at most
$n + 1$ distinct characters in the current prefix ($\perp$ and
the $n$ characters in the alphabet), this maintains the
following invariant: For $1 \le i \le m$ and $a \in \perp s_1 \cdots s_{i-1}$,
immediately after we encode $s_1 \cdots s_{i-1}$, the leaf labelled
$a$ has weight between $-\lceil \log((i + n)/\#_a(\perp s_1 \cdots s_{i-1})) \rceil$
and $-\lceil \log(\max(i, n)/\#_a(\perp s_1 \cdots s_{i-1})) \rceil$. Notice that
$\max(i, n) < i + n \le 2\max(i, n)$ and $\#_a(s_1 \cdots s_{i-1}) \le
\#_a(\perp s_1 \cdots s_i) \le 2\#_a(\perp s_1 \cdots s_{i-1}) + 1$. Also, if $s_i$ is
the first occurrence of that distinct character in $S$, then
$\#_{s_i}(\perp s_1 \cdots s_i) = \#_\perp(\perp s_1 \cdots s_{i-1})$. It follows that,
whenever we update a weight, we use at most one increment
or decrement.
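The interplay of foreground and background updates can be simulated without building the tree at all. The sketch below (our own illustration; `simulate_weights` is a hypothetical name) tracks only the leaf weights, checks the invariant before each character is encoded, and records the largest single change made to any weight. It assumes the initial weight of the leaf labelled $\perp$ is $-\lceil \log(1 + n) \rceil$ and that $S$ uses at most $n$ distinct characters.

```python
from collections import Counter, deque

BOTTOM = "\u22a5"  # stand-in for the special character not in the alphabet

def ceil_log2_ratio(p, q):
    """Exact ceil(log2(p/q)) for integers p >= q >= 1."""
    k = 0
    while (q << k) < p:
        k += 1
    return k

def simulate_weights(s, n):
    """Track leaf weights only; return (final weights, largest single change)."""
    counts = Counter({BOTTOM: 1})
    weight = {BOTTOM: -ceil_log2_ratio(1 + n, 1)}
    queue = deque([BOTTOM])
    max_step = 0
    for i, ch in enumerate(s, start=1):
        # Invariant before encoding s_i: every weight lies in the slack window.
        for a, w in weight.items():
            low = -ceil_log2_ratio(i + n, counts[a])
            high = -ceil_log2_ratio(max(i, n), counts[a])
            assert low <= w <= high, (i, a, w)
        # Foreground: insert a new leaf with the weight of the BOTTOM leaf.
        if ch not in weight:
            weight[ch] = weight[BOTTOM]
            queue.append(ch)
        counts[ch] += 1
        # Background: cycle one character through the queue.
        a = queue.popleft()
        queue.append(a)
        # Refresh the weights of the (at most two) touched leaves.
        for leaf in (ch, a):
            target = -ceil_log2_ratio(i + 1 + n, counts[leaf])
            max_step = max(max_step, abs(target - weight[leaf]))
            weight[leaf] = target
    return weight, max_step

if __name__ == "__main__":
    print(simulate_weights("abracadabra", 5))
```

In a real implementation each change of one unit would be an increment or decrement operation on the dynamic minimax tree; the simulation only confirms that a single such operation per touched leaf is enough.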

Our analysis of this algorithm is similar to that in Section II,
with two differences. First, we show that weakening the bound
on codeword lengths does not significantly affect the bound on
the length of the encoding. Second, we show that our algorithm
only takes $O((H + 1)m + n \log^2 m)$ time. Our analysis depends
on the following technical lemma.

Lemma 3: Suppose $I \subseteq \mathbb{Z}_+$ and $|I| \ge n$. Then

$$\sum_{i \in I} \log \frac{i + n}{x_i} \le \sum_{i \in I} \log \frac{i}{x_i} + n \log(\max I + n) .$$

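Proof: Let

$$L = \sum_{i \in I} \log \frac{i + n}{x_i} = \sum_{i \in I} \log \frac{i + n}{i} + \sum_{i \in I} \log \frac{i}{x_i} .$$

Let $i_1, \ldots, i_{|I|}$ be the elements of $I$, with $0 < i_1 < \cdots < i_{|I|}$.
Then $i_j + n \le i_{j+n}$, so

$$\begin{aligned}
\sum_{i \in I} \log \frac{i + n}{i}
  &= \log \frac{\left( \prod_{j=1}^{|I|-n} (i_j + n) \right) \left( \prod_{j=|I|-n+1}^{|I|} (i_j + n) \right)}{\left( \prod_{j=1}^{n} i_j \right) \left( \prod_{j=n+1}^{|I|} i_j \right)} \\
  &\le \log \frac{\left( \prod_{j=1}^{|I|-n} i_{j+n} \right) (\max I + n)^n}{1 \cdot \prod_{j=n+1}^{|I|} i_j} \\
  &= n \log(\max I + n) .
\end{aligned}$$

Therefore,

$$L \le \sum_{i \in I} \log \frac{i}{x_i} + n \log(\max I + n) .$$

■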
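Since the $x_i$ appear identically on both sides, Lemma 3 amounts to the inequality $\sum_{i \in I} \log((i + n)/i) \le n \log(\max I + n)$. A quick numerical check, given here only as an illustration with a hypothetical helper name:

```python
from math import log2

def lemma3_sides(indices, n):
    """Return (sum of log2((i+n)/i) over I, n*log2(max I + n))."""
    indices = list(indices)
    assert len(indices) >= n and min(indices) > 0
    lhs = sum(log2((i + n) / i) for i in indices)
    rhs = n * log2(max(indices) + n)
    return lhs, rhs

if __name__ == "__main__":
    for I, n in [(range(1, 50), 5), ([2, 3, 10, 100, 1000], 3)]:
        lhs, rhs = lemma3_sides(I, n)
        print(f"n = {n}: {lhs:.3f} <= {rhs:.3f}")
```

For $I = \{1, \ldots, 49\}$ and $n = 5$ the left side telescopes to $\log \binom{54}{5}$, well below $5 \log 54$.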


Using Lemmas 1 and 3, it is easy to bound the number of
bits and the time dynamic Shannon coding uses to encode $S$,
as follows.

Theorem 4: Dynamic Shannon coding uses at most $(H +
1)m + O(n \log m)$ bits and $O((H + 1)m + n \log^2 m)$ time.

Proof: First, we consider the length of the encoding
produced. Notice that the algorithm encodes $S$ using at most

$$\sum_{i \in R} \left\lceil \log \frac{i + n}{\#_{s_i}(\perp s_1 \cdots s_{i-1})} \right\rceil + O(n \log m) < m + \sum_{i \in R} \log \frac{i + n}{\#_{s_i}(s_1 \cdots s_{i-1})} + O(n \log m)$$

bits. By Lemmas 1 and 3, this is at most $(H + 1)m +
O(n \log m)$.

Now, we consider how long this algorithm takes. We will
prove separate bounds on the processing done in the foreground
and in the background.

If $s_i$ is the first occurrence of that character in $S$ (i.e.,
$i \in \{1, \ldots, m\} - R$), then we perform three operations in the
foreground when we encode $s_i$: we output the codeword for
$\perp$, which is at most $\lceil \log(i + n) \rceil$ bits; we output the index of $s_i$
in the alphabet, which is $\lceil \log n \rceil$ bits; and we insert a new leaf
labelled $s_i$ and update its weight to be $-\lceil \log(i + 1 + n) \rceil$. In
total, these take $O(\log(i + n)) \subseteq O(\log m)$ time. Since there
are at most $n$ such characters, the algorithm encodes them all
using $O(n \log m)$ time.

For $i \in R$, we perform at most two operations
in the foreground when we encode $s_i$: we output
the codeword for $s_i$, which is of length at most
$\lceil \log((i + n)/\#_{s_i}(s_1 \cdots s_{i-1})) \rceil$; and, if necessary, we increment
the weight of the leaf labelled $s_i$. In total, these take
$O(\log((i + n)/\#_{s_i}(s_1 \cdots s_{i-1})))$ time.

For $1 \le i \le m$, we perform at most two operations in the
background when we encode $s_i$: we dequeue a character $a$;
if necessary, decrement the weight of the leaf labelled $a$; and
re-enqueue $a$. These take $O(1)$ time if we do not decrement
the weight of the leaf labelled $a$ and $O(\log m)$ time if we do.

Suppose $s_i$ is the first occurrence of that distinct character
in $S$. Then the leaf $v$ labelled $s_i$ is inserted into $T$ with weight
$-\lceil \log(i + n) \rceil$. Also, $v$’s weight is never less than $-\lceil \log(m +
1 + n) \rceil$. Since decrementing $v$’s weight from $w$ to $w - 1$ or
incrementing $v$’s weight from $w - 1$ to $w$ both take $O(-w)$
time, we spend the same amount of time decrementing $v$’s
weight in the background as we do incrementing it in the
foreground, except possibly for the time to decrease $v$’s weight
from $-\lceil \log(i + n) \rceil$ to $-\lceil \log(m + 1 + n) \rceil$. Thus, we spend
$O(\log^2 m)$ more time decrementing $v$’s weight than we do
incrementing it. Since there are at most $n$ distinct characters
in $S$, in total, this algorithm takes

$$\sum_{i \in R} O\left( \log \frac{i + n}{\#_{s_i}(s_1 \cdots s_{i-1})} \right) + O(n \log^2 m)$$

time. It follows from Lemmas 1 and 3 that this is $O((H +
1)m + n \log^2 m)$. ■

IV. Variations on Dynamic Shannon Coding 

In this section, we show how to implement efficiently 
variations of dynamic Shannon coding for dynamic length- 
restricted coding, dynamic alphabetic coding and dynamic 
coding with unequal letter costs. Abrahams [11] surveys static 
algorithms for these and similar problems, but there has 
been relatively little work on dynamic algorithms for these 
problems. 

We use dynamic minimax trees for length-restricted dynamic
Shannon coding. For alphabetic dynamic Shannon coding,
we dynamize Mehlhorn’s version of Shannon’s algorithm.
For dynamic Shannon coding with unequal letter costs, we
dynamize Krause’s version.


A. Length-Restricted Dynamic Shannon Coding 

For length-restricted coding, we are given a bound and 
cannot use a codeword whose length exceeds this bound. 
Length-restricted coding is useful, for example, for ensuring 
that each codeword fits in one machine word. Liddell and 
Moffat [12] gave a length-restricted dynamic coding algorithm 
that works well in practice, but it is quite complicated and 
they did not prove bounds on the length of the encoding it 
produces. We show how to length-restrict dynamic Shannon 
coding without significantly increasing the bound on the length 
of the encoding produced. 

Theorem 5: For any fixed integer $\ell \ge 1$, dynamic Shannon
coding can be adapted so that it uses at most $2\lceil \log n \rceil + \ell$ bits
to encode the first occurrence of each distinct character in $S$,
at most $\lceil \log n \rceil + \ell$ bits to encode each remaining character
in $S$, at most $\left( H + 1 + \frac{1}{(2^\ell - 1) \ln 2} \right) m + O(n \log m)$ bits in
total, and $O((H + 1)m + n \log^2 m)$ time.

Proof: We modify the algorithm presented in Section III
by removing the leaf labelled $\perp$ after all of the characters in the
alphabet have occurred in $S$, and changing how we calculate
weights for the dynamic minimax tree. Whenever we would
use a weight of the form $-\lceil \log x \rceil$, we smooth it by instead
using

$$-\left\lceil \log \frac{2^\ell}{(2^\ell - 1)/x + 1/n} \right\rceil \ge -\min\left( \left\lceil \log \frac{2^\ell x}{2^\ell - 1} \right\rceil,\ \lceil \log n \rceil + \ell \right) .$$

With these modifications, no leaf in the minimax tree is ever
of depth greater than $\lceil \log n \rceil + \ell$. Since

$$\left\lceil \log \frac{2^\ell x}{2^\ell - 1} \right\rceil < \log x + 1 + \log\left( 1 + \frac{1}{2^\ell - 1} \right) < \log x + 1 + \frac{1}{(2^\ell - 1) \ln 2} ,$$

essentially the same analysis as for Theorem 4 shows this
algorithm uses at most $\left( H + 1 + \frac{1}{(2^\ell - 1) \ln 2} \right) m + O(n \log m)$
bits in total, and $O((H + 1)m + n \log^2 m)$ time. ■
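The effect of the smoothing can be seen in a few lines of code. The sketch below is our own illustration: it evaluates the smoothed depth exactly in integer arithmetic for $x = p/q \ge 1$, and the names `smoothed_depth` and `ceil_log2_ratio` are hypothetical.

```python
def ceil_log2_ratio(p, q):
    """Exact ceil(log2(p/q)) for integers p >= q >= 1."""
    k = 0
    while (q << k) < p:
        k += 1
    return k

def smoothed_depth(p, q, n, ell):
    """ceil(log2(2^ell / ((2^ell - 1)/x + 1/n))) for x = p/q >= 1.

    Multiplying numerator and denominator by p*n gives the integer ratio
    2^ell * p * n / ((2^ell - 1) * q * n + p).
    """
    big = 1 << ell
    return ceil_log2_ratio(big * p * n, (big - 1) * q * n + p)

if __name__ == "__main__":
    n, ell = 4, 2
    for x in (1, 3, 10, 1000, 10**6):
        print(x, ceil_log2_ratio(x, 1), smoothed_depth(x, 1, n, ell))
```

With $n = 4$ and $\ell = 2$, the unsmoothed depth $\lceil \log x \rceil$ grows without bound as $x$ grows, while the smoothed depth never exceeds $\lceil \log n \rceil + \ell = 4$.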
It is straightforward to prove a similar theorem in which the
number of bits used to encode $s_i$ with $i \in R$ is bounded above
by $\lceil \log(|\{a : a \in S\}| + 1) \rceil + \ell + 1$ instead of $\lceil \log n \rceil + \ell$.
That is, we can make the bound in terms of the number of
distinct characters in $S$ instead of the size of the alphabet.
To do this, we modify the algorithm again so that it stores
a counter $n_i$ of the number of distinct characters that have
occurred in the current prefix. Whenever we would use $n$ in
a formula to calculate a weight, we use $2(n_i + 1)$ instead.


B. Alphabetic Dynamic Shannon Coding 

For alphabetic coding, the lexicographic order of the codewords
must always be the same as the lexicographic order of
the characters to which they are assigned. Alphabetic coding
is useful, for example, because we can compare encoded
strings without decoding them. Although there is an alphabetic
version of minimax trees [13], it cannot be efficiently
dynamized [10]. Mehlhorn [14] generalized Shannon’s algorithm
to obtain an algorithm for alphabetic coding. In this section, 
we dynamize Mehlhorn’s algorithm. 

Theorem 6 (Mehlhorn, 1977): There exists an alphabetic
prefix-free code such that, for each character $a$
in the alphabet, the codeword for $a$ is of length
$\lceil \log((m + n)/\#_a(S)) \rceil + 1$.

Proof: Let $a_1, \ldots, a_n$ be the characters in the alphabet
in lexicographic order. For $1 \le i \le n$, let

$$f(a_i) = \frac{\#_{a_i}(S) + 1}{2(m + n)} + \sum_{j=1}^{i-1} \frac{\#_{a_j}(S) + 1}{m + n} .$$

For $1 \le i \ne i' \le n$, notice that $|f(a_i) - f(a_{i'})| \ge
\frac{\#_{a_i}(S) + 1}{2(m + n)}$. Therefore, the first
$\lceil \log((m + n)/\#_{a_i}(S)) \rceil + 1$ bits of
the binary representation of $f(a_i)$ suffice to distinguish it. Let
this sequence of bits be the codeword for $a_i$. ■
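The construction in this proof is easy to carry out with exact rational arithmetic. In the sketch below, which is our own illustration, codeword lengths are computed as $\lceil \log((m + n)/(\#_a(S) + 1)) \rceil + 1$; this is an assumption made so that characters with zero occurrences are handled, and it never exceeds the length stated in Theorem 6.

```python
from fractions import Fraction

def ceil_log2_ratio(p, q):
    """Exact ceil(log2(p/q)) for integers p >= q >= 1."""
    k = 0
    while (q << k) < p:
        k += 1
    return k

def first_bits(x, k):
    """First k bits after the binary point of a Fraction x in [0, 1)."""
    return format(int(x * 2**k), "b").zfill(k)

def mehlhorn_code(counts_in_order):
    """counts_in_order: list of (character, count) in lexicographic order."""
    n = len(counts_in_order)
    m = sum(c for _, c in counts_in_order)
    code, below = {}, Fraction(0)
    for a, c in counts_in_order:
        share = Fraction(c + 1, m + n)
        f = below + share / 2              # midpoint of this character's slot
        length = ceil_log2_ratio(m + n, c + 1) + 1
        code[a] = first_bits(f, length)
        below += share
    return code

if __name__ == "__main__":
    # Character counts of "abracadabra" over the alphabet {a, b, c, d, r}.
    print(mehlhorn_code([("a", 5), ("b", 2), ("c", 1), ("d", 1), ("r", 2)]))
```

For this input the codewords come out as 001, 0111, 1010, 1100 and 1110, which are prefix-free and in the same lexicographic order as the characters.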

Repeating Mehlhorn’s algorithm after each character of S, 
as described in the introduction, is a simple algorithm for 
alphabetic dynamic Shannon coding. Notice that we always 
assign a codeword to every character in the alphabet; thus, 
we do not need to prepend $\perp$ to the current prefix of $S$. This
algorithm uses at most $(H + 2)m + O(n \log m)$ bits and $O(mn)$
time to encode $S$.

To make this algorithm more efficient, after encoding each 
character of S, instead of computing an entire code-tree, we 
only compute the codeword for the next character in S. We 
use an augmented splay tree [15] to compute the necessary 
partial sums. 

Theorem 7: Alphabetic dynamic Shannon coding uses
$(H + 2)m + O(n \log m)$ bits and $O((H + 1)m)$ time.

Proof: We keep an augmented splay tree $T$ and maintain
the invariant that, after encoding $s_1 \cdots s_{i-1}$, there is a node
$v_a$ in $T$ for each distinct character $a$ in $s_1 \cdots s_{i-1}$. The node
$v_a$’s key is $a$; it stores $a$’s frequency in $s_1 \cdots s_{i-1}$ and the
sum of the frequencies of the characters in $v_a$’s subtree in $T$.

To encode $s_i$, we use $T$ to compute the partial sum

$$\frac{\#_{s_i}(s_1 \cdots s_{i-1})}{2} + \sum_{a_j < s_i} \#_{a_j}(s_1 \cdots s_{i-1}) ,$$

where $a_j < s_i$ means that $a_j$ is lexicographically less than $s_i$.
From this, we compute the codeword for $s_i$, that is, the first
$\left\lceil \log \frac{i - 1 + n}{\#_{s_i}(s_1 \cdots s_{i-1}) + 1} \right\rceil + 1$ bits of the binary representation
of

$$\frac{\#_{s_i}(s_1 \cdots s_{i-1}) + 1}{2(i - 1 + n)} + \sum_{a_j < s_i} \frac{\#_{a_j}(s_1 \cdots s_{i-1}) + 1}{i - 1 + n} .$$

If $s_i$ is the first occurrence of that character in $S$ (i.e., $i \in
\{1, \ldots, m\} - R$), then we insert a node $v_{s_i}$ into $T$. In both
cases, we update the information stored at the ancestors of $v_{s_i}$
and splay $v_{s_i}$ to the root.

Essentially the same analysis as for Theorem 4 shows this
algorithm uses at most $(H + 2)m + O(n \log m)$ bits. By the
Static Optimality theorem [15], it uses $O((H + 1)m)$ time. ■
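To make the codeword computation in this proof concrete, the following sketch encodes a string character by character. It deliberately replaces the augmented splay tree with a plain list of counts, so each step takes $O(n)$ time rather than the amortized bound above; it is meant only to show which codeword is emitted. The function name and the representation of the alphabet as a sorted string are our assumptions.

```python
from fractions import Fraction

def ceil_log2_ratio(p, q):
    """Exact ceil(log2(p/q)) for integers p >= q >= 1."""
    k = 0
    while (q << k) < p:
        k += 1
    return k

def alphabetic_dynamic_encode(s, alphabet):
    """Return the list of codewords emitted for s.

    alphabet is a string listing the characters in lexicographic order.
    A list of counts stands in for the splay tree of the proof.
    """
    n = len(alphabet)
    counts = [0] * n                  # counts[j]: occurrences of alphabet[j] so far
    out = []
    for i, ch in enumerate(s, start=1):
        r = alphabet.index(ch)
        total = i - 1 + n
        below = sum(counts[:r]) + r   # sum of (count + 1) over smaller characters
        value = Fraction(counts[r] + 1, 2 * total) + Fraction(below, total)
        length = ceil_log2_ratio(total, counts[r] + 1) + 1
        out.append(format(int(value * 2**length), "b").zfill(length))
        counts[r] += 1
    return out

if __name__ == "__main__":
    print(alphabetic_dynamic_encode("abracadabra", "abcdr"))
```

For example, encoding "ab" over the alphabet "ab" emits 01 and then 110.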


C. Dynamic Shannon Coding with Unequal Letter Costs 

It may be that one code letter costs more than another. For
example, sending a dash by telegraph takes longer than sending
a dot. Shannon [1] proved a lower bound of $Hm \ln(2)/C$
for all algorithms, whether prefix-free or not, where the
channel capacity $C$ is the largest real root of
$e^{-\mathrm{cost}(0) \cdot x} + e^{-\mathrm{cost}(1) \cdot x} = 1$ and $e \approx 2.71$ is the base of the natural
logarithm. Krause [16] generalized Shannon’s algorithm for
the case with unequal positive letter costs. In this section, we
dynamize Krause’s algorithm.
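The channel capacity has no closed form in general, but it is easy to approximate. The sketch below finds the positive root of $e^{-\mathrm{cost}(0) \cdot x} + e^{-\mathrm{cost}(1) \cdot x} = 1$ by bisection; for positive costs the left side is strictly decreasing in $x$, so this root is unique. The function name and tolerance are our own choices.

```python
from math import exp, log

def capacity(cost0, cost1, tol=1e-12):
    """Approximate the root of exp(-cost0*x) + exp(-cost1*x) = 1 by bisection."""
    def g(x):
        return exp(-cost0 * x) + exp(-cost1 * x) - 1.0

    lo, hi = 0.0, 1.0
    while g(hi) > 0:          # grow the bracket until the sum drops below 1
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    print(capacity(1, 1), log(2))                    # equal costs
    print(capacity(1, 2), log((1 + 5 ** 0.5) / 2))   # costs 1 and 2
```

With equal unit costs the root is $\ln 2$, so the lower bound $Hm \ln(2)/C$ reduces to $Hm$; with costs 1 and 2 the root is the natural logarithm of the golden ratio.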

Theorem 8 (Krause, 1962): Suppose $\mathrm{cost}(0)$ and $\mathrm{cost}(1)$
are constants with $0 < \mathrm{cost}(0) \le \mathrm{cost}(1)$. Then there exists a
prefix-free code such that, for each character $a$ in the alphabet,
the codeword for $a$ has cost less than $\frac{\ln(m/\#_a(S))}{C} + \mathrm{cost}(1)$.

Proof: Let $a_1, \ldots, a_k$ be the characters in $S$ in non-increasing
order by frequency. For $1 \le i \le k$, let

$$f(a_i) = \sum_{j=1}^{i-1} \frac{\#_{a_j}(S)}{m} < 1 .$$

Let $b(a_i)$ be the following binary string, where $x_0 = 0$ and
$y_0 = 1$: For $j \ge 1$, if $f(a_i)$ is in the first $e^{-\mathrm{cost}(0) \cdot C}$ fraction
of the interval $[x_{j-1}, y_{j-1})$, then the $j$th bit of $b(a_i)$ is 0 and
$x_j$ and $y_j$ are such that $[x_j, y_j)$ is the first $e^{-\mathrm{cost}(0) \cdot C}$ fraction
of $[x_{j-1}, y_{j-1})$. Otherwise, the $j$th bit of $b(a_i)$ is 1 and $x_j$
and $y_j$ are such that $[x_j, y_j)$ is the last $e^{-\mathrm{cost}(1) \cdot C}$ fraction of
$[x_{j-1}, y_{j-1})$. Notice that the cost to encode the $j$th bit of $b(a_i)$
is exactly $\frac{\ln((y_{j-1} - x_{j-1})/(y_j - x_j))}{C}$. It follows that the total cost
to encode the first $j$ bits of $b(a_i)$ is $\frac{\ln(1/(y_j - x_j))}{C}$.

For $1 \le i \ne i' \le k$, notice that $|f(a_i) - f(a_{i'})| \ge
\#_{a_i}(S)/m$. Therefore, if $y_j - x_j < \#_{a_i}(S)/m$, then the
first $j$ bits of $b(a_i)$ suffice to distinguish it. So the shortest
prefix of $b(a_i)$ that suffices to distinguish $b(a_i)$ has cost less
than $\frac{\ln(e^{\mathrm{cost}(1) \cdot C} m/\#_{a_i}(S))}{C} = \frac{\ln(m/\#_{a_i}(S))}{C} + \mathrm{cost}(1)$. Let this
sequence of bits be the codeword for $a_i$. ■
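The interval-splitting construction in this proof can be sketched as follows. Two simplifications are ours: the capacity $C$ is supplied by the caller, and each codeword is extended until the interval is narrower than $\#_{a_i}(S)/m$, which is the sufficient condition used in the proof and may yield a slightly longer codeword than the shortest distinguishing prefix. Floating-point arithmetic is used, so inputs that land exactly on a split point may behave differently.

```python
from collections import Counter
from math import exp, log

def krause_code(s, cost0, cost1, capacity):
    """Build codewords by repeatedly splitting [0, 1) in the ratio set by cost0."""
    p = exp(-cost0 * capacity)        # fraction of each interval given to bit 0
    m = len(s)
    ranked = Counter(s).most_common() # non-increasing order by frequency
    code, before = {}, 0
    for a, c in ranked:
        f, x, y, bits = before / m, 0.0, 1.0, []
        while y - x >= c / m:         # stop once the interval isolates f
            split = x + p * (y - x)
            if f < split:
                bits.append("0")
                y = split
            else:
                bits.append("1")
                x = split
        code[a] = "".join(bits)
        before += c
    return code

if __name__ == "__main__":
    C = log((1 + 5 ** 0.5) / 2)       # capacity for costs 1 and 2
    print(krause_code("aaaabbc", 1, 2, C))
```

With costs 1 and 2, the string "aaaabbc" yields the codewords 00, 01 and 110, with costs 2, 3 and 5.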

Repeating Krause’s algorithm after each character of S, 
as described in the introduction, is a simple algorithm for 
dynamic Shannon coding with unequal letter costs. This 
algorithm produces an encoding of $S$ with cost at most
$\left( \frac{H \ln 2}{C} + \mathrm{cost}(1) \right) m + O(n \log m)$ in $O(mn)$ time.

As in Subsection IV-B, we can make this simple algorithm
more efficient by only computing the codewords we need.
However, instead of lexicographic order, we want to keep
the characters in non-increasing order by frequency in the 
current prefix. We use a data structure for dynamic cumulative 
probability tables [17], due to Moffat. This data structure stores 
a list of characters in non-increasing order by frequency and 
supports the following operations: 

• given a character a, return a’s frequency; 

• given a character a, return the total frequency of all 
characters before a in the list; 

• given a character a, increment a’s frequency; and, 

• given an integer k, return the last character a in the list 
such that the total frequency of all characters before a is 
at most k. 

If $a$’s frequency is a $p$ fraction of the total frequency of all
characters in the list, then an operation that is given $a$ or
returns $a$ takes $O(\log(1/p))$ time.

Dynamizing Krause’s algorithm using Moffat’s data structure
gives the following theorem, much as dynamizing
Mehlhorn’s algorithm with an augmented splay tree gave
Theorem 7. We omit the proof because it is very similar.

Theorem 9: Suppose $\mathrm{cost}(0)$ and $\mathrm{cost}(1)$ are constants
with $0 < \mathrm{cost}(0) \le \mathrm{cost}(1)$. Then dynamic Shannon
coding produces an encoding of $S$ with cost at most
$\left( \frac{H \ln 2}{C} + \mathrm{cost}(1) \right) m + O(n \log m)$ in $O((H + 1)m)$ time.

If $\mathrm{cost}(0) = \mathrm{cost}(1) = 1$, then $C = \ln 2$ and the bound on the
encoding in Theorem 9 is the same as in Theorem 4. We considered this special case
first because it is the only one in which we know how to 
efficiently maintain the code-tree, which may be useful for 
some applications. 